Generative Image Modeling Using Spatial LSTMs

Authors: Bethge Matthias, Theis Lucas
Publication date: 18/09/2015

Modeling the distribution of natural images is challenging, partly because of strong statistical dependencies that can extend over hundreds of pixels. Recurrent neural networks have been successful in capturing long-range dependencies in a number of problems but have only recently found their way into generative image models. Here we introduce a recurrent image model based on multi-dimensional long short-term memory units, which are particularly suited for image modeling due to their spatial structure. Our model scales to images of arbitrary size, and its likelihood is computationally tractable. We find that it outperforms the state of the art in quantitative comparisons on several image datasets and produces promising results when used for texture synthesis and inpainting.

arXiv.org e-Print Archive

CiteSeerX

MPG.PuRe
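
As a rough illustration of the multi-dimensional LSTM recurrence mentioned in the first abstract, the sketch below scans a 2D grid in raster order so that the state at each position depends on the states directly above it and directly to its left. It is a generic two-forget-gate MDLSTM-style cell written in NumPy; the function name `spatial_lstm_scan`, the gate layout inside `weights`, and the use of precomputed per-pixel input features are assumptions for illustration, not the authors' exact model or code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def spatial_lstm_scan(x, weights, bias, hidden):
    """Hypothetical MDLSTM-style scan over a (rows, cols, features) grid.

    weights has shape (5 * hidden, features + 2 * hidden); its rows are
    assumed to hold the input gate, row forget gate, column forget gate,
    output gate and candidate cell update, in that order.
    """
    rows, cols, _ = x.shape
    # Extra zero row/column at index 0 acts as the boundary state.
    h = np.zeros((rows + 1, cols + 1, hidden))
    c = np.zeros((rows + 1, cols + 1, hidden))
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            # Current input plus hidden states from above and from the left.
            z = weights @ np.concatenate([x[i - 1, j - 1], h[i - 1, j], h[i, j - 1]]) + bias
            gi, fr, fc, go, gg = np.split(z, 5)
            gi, fr, fc, go = sigmoid(gi), sigmoid(fr), sigmoid(fc), sigmoid(go)
            gg = np.tanh(gg)
            # Separate forget gates for the cell above and the cell to the left.
            c[i, j] = gi * gg + fr * c[i - 1, j] + fc * c[i, j - 1]
            h[i, j] = go * np.tanh(c[i, j])
    return h[1:, 1:]


# Toy usage with random (hypothetical) features and weights.
rng = np.random.default_rng(0)
hidden = 4
x = rng.normal(size=(3, 5, 2))
weights = rng.normal(scale=0.1, size=(5 * hidden, 2 + 2 * hidden))
bias = np.zeros(5 * hidden)
states = spatial_lstm_scan(x, weights, bias, hidden)
print(states.shape)  # (3, 5, 4)
```

Because the scan only looks up and to the left, changing the input at the last pixel leaves every earlier state unchanged. In an autoregressive image model this causal structure is what allows the likelihood to factorize pixel by pixel; there, each position's input would be its already-generated neighborhood rather than the pixel being predicted.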

Evaluating Models of Scanpath Prediction

Authors: Bethge Matthias, Kümmerer Matthias
Publication venue: Purdue University (bepress)
Publication date: 16/05/2023

Purdue E-Pubs

Using Deep Features to Predict Where People Look

Authors: Bethge Matthias, Kümmerer Matthias
Publication venue: Purdue University (bepress)
Publication date: 11/05/2016

When free-viewing scenes, the first few fixations of human observers are driven in part by bottom-up attention. We seek to characterize this process by extracting all the information in images that can be used to predict fixation densities (Kuemmerer et al., PNAS, 2015). If we ignore time and observer identity, the average amount of information is slightly more than 2 bits per image for the MIT 1003 dataset. The minimum amount of information is 0.3 bits and the maximum is 5.2 bits. Before the rise of deep neural networks, the best models captured about one third of this information on average. We developed new saliency algorithms based on high-performing convolutional neural networks, such as AlexNet and VGG-19, which have been shown to provide generally useful representations of natural images. Using a transfer learning paradigm, we first developed DeepGaze I, based on AlexNet, which captures 56% of the total information. Subsequently, we developed DeepGaze II, based on VGG-19, which captures 88% and is state of the art on the MIT 300 benchmark dataset. We will show best-case and worst-case examples, as well as feature selection methods that visualize which image structures are critical for predicting fixation densities.

Purdue E-Pubs
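
The DeepGaze abstract scores models by how much information about fixation locations they capture, measured in bits. A minimal sketch of that kind of measurement, assuming model and baseline predictions are given as normalized pixel-level probability maps, is the average log2 likelihood ratio over observed fixations. The fraction of information a model captures is then its gain divided by the gain of a gold-standard predictor. The helper names, the uniform baseline and the toy data below are hypothetical; the exact baseline and gold standard used by the authors are not given in the abstract.

```python
import numpy as np


def information_gain(model_density, baseline_density, fixations):
    """Mean log2 likelihood ratio (bits per fixation) of a model over a baseline.

    Both densities are 2D arrays of pixel probabilities that sum to 1;
    `fixations` is a list of (row, col) pixel coordinates.
    """
    rows = np.array([r for r, _ in fixations])
    cols = np.array([c for _, c in fixations])
    gain = np.log2(model_density[rows, cols]) - np.log2(baseline_density[rows, cols])
    return float(np.mean(gain))


def explained_fraction(model_gain, gold_standard_gain):
    """Share of the gold standard's information gain that a model captures."""
    return model_gain / gold_standard_gain


# Toy example (hypothetical data): a 4x4 image with a uniform baseline.
baseline = np.full((4, 4), 1 / 16)
model = np.full((4, 4), 0.5 / 15)
model[0, 0] = 0.5  # the model puts half its mass on the top-left pixel
fixations = [(0, 0), (0, 0)]
print(information_gain(model, baseline, fixations))  # 3.0
```

Applied to the figures in the abstract and taking the total as 2 bits per image, 56% corresponds to roughly 1.1 bits for DeepGaze I and 88% to roughly 1.8 bits for DeepGaze II.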